Hierarchical Segmentation of Videos into Shots and Scenes using Visual Content
Abstract
With the large amount of video data now available, the ability to search and browse videos quickly has become increasingly important. The objective of this project is to make it easier to find specific content in videos by creating a video search tool. Its immediate goal is to automatically perform a hierarchical segmentation of videos, particularly full-length movies, before a search for a specific query is carried out. We approach the problem by first segmenting the video into its film units. Once the units have been extracted, various similarity measures between features extracted from those units can be used to locate specific sections of the movie.
To search a film properly, we must first have access to its basic units. A movie can be broken down into a hierarchy of three units: frames, shots, and scenes. The important first step is to partition the film into shots. Shot detection, the process of locating the transitions between different cameras, is performed in four steps: reducing the colors, using the 4-Histograms method to calculate the distance between neighboring frames, applying a second-order derivative to the resulting distance vector, and using an automatically calculated threshold to locate shot cuts.
Scene detection is generally a more difficult task than shot detection. After the shot boundaries of a video have been detected, the next step toward scene detection is to calculate a similarity measure that can be used to cluster shots into scenes. Various keyframe extraction algorithms and similarity measures from the literature were considered and compared. Frame sampling was selected for obtaining keyframe sets, and the Bhattacharyya distance was selected as the similarity measure, for use in the scene detection algorithm.
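The shot detection steps named in the abstract (color reduction, a distance between neighboring frames, a second-order derivative, and an automatic threshold) can be illustrated with a minimal sketch. Several parts are assumptions, because the abstract does not define them: a plain L1 histogram distance stands in for the 4-Histograms method, the threshold is taken as the mean plus `k` standard deviations of the derivative response, and frames are toy lists of 8-bit grayscale values. All function names are hypothetical.

```python
from statistics import mean, pstdev

def reduce_colors(frame, levels=8):
    """Quantize 8-bit pixel values into `levels` bins (color reduction)."""
    return [p * levels // 256 for p in frame]

def histogram(frame, levels=8):
    """Normalized histogram of a color-reduced frame."""
    counts = [0] * levels
    for p in reduce_colors(frame, levels):
        counts[p] += 1
    return [c / len(frame) for c in counts]

def frame_distances(frames, levels=8):
    """L1 histogram distance between neighboring frames.
    Assumption: a stand-in for the 4-Histograms method."""
    hists = [histogram(f, levels) for f in frames]
    return [sum(abs(a - b) for a, b in zip(h1, h2))
            for h1, h2 in zip(hists, hists[1:])]

def second_derivative(d):
    """Negated discrete second difference of the distance vector.
    An isolated peak in d becomes a large positive value."""
    return [-(d[i - 1] - 2 * d[i] + d[i + 1]) for i in range(1, len(d) - 1)]

def detect_cuts(frames, k=2.0, levels=8):
    """Return the index of the first frame of each detected new shot."""
    response = second_derivative(frame_distances(frames, levels))
    if not response:
        return []
    # Assumed automatic threshold: mean + k standard deviations.
    threshold = mean(response) + k * pstdev(response)
    # response[j] is centered on the transition from frame j+1 to frame j+2.
    return [j + 2 for j, r in enumerate(response) if r > threshold]

# Toy input: eight dark frames followed by eight bright frames.
frames = [[10] * 16] * 8 + [[200] * 16] * 8
print(detect_cuts(frames))  # [8]
```

The second difference suppresses slow changes in the distance vector, such as gradual lighting shifts, while keeping abrupt single-frame jumps, which is presumably why a derivative is applied before thresholding.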
A binary shot similarity map is then created from the keyframe sets and the Bhattacharyya distance. Next, a temporal distance weight and a predetermined threshold are applied to the map to obtain the final binary similarity map. The last step uses the proposed algorithm to locate the shot clusters along the diagonal of the map, which correspond to scenes. These methods and measures were successfully implemented in the Video Search Tool to hierarchically segment videos into shots and scenes.
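The scene detection steps can be sketched in the same spirit. The Bhattacharyya distance below follows its standard definition for normalized histograms; everything else is an assumption made for illustration, since the abstract gives no formulas: shots are compared by their closest pair of keyframe histograms, the temporal weight is an exponential decay with a hypothetical scale `tau`, the weight is applied to a continuous similarity before a single threshold, and a greedy grouping of linked shots stands in for the proposed clustering algorithm.

```python
import math

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)) for two normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc <= 0.0 else max(0.0, -math.log(bc))

def shot_distance(keys_a, keys_b):
    """Assumed rule: closest pair of keyframe histograms between two shots."""
    return min(bhattacharyya_distance(p, q) for p in keys_a for q in keys_b)

def binary_similarity_map(shots, tau=3.0, threshold=0.5):
    """Binary shot-by-shot map after temporal weighting and thresholding."""
    n = len(shots)
    sim = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            similarity = math.exp(-shot_distance(shots[i], shots[j]))  # in [0, 1]
            weight = math.exp(-abs(i - j) / tau)  # assumed temporal distance weight
            sim[i][j] = similarity * weight >= threshold
    return sim

def scenes_from_map(sim):
    """Greedy stand-in for the proposed algorithm: grow each scene to the
    farthest later shot linked to any shot already inside it."""
    scenes, start, n = [], 0, len(sim)
    while start < n:
        end = start
        i = start
        while i <= end:
            for j in range(n - 1, end, -1):
                if sim[i][j]:
                    end = j
                    break
            i += 1
        scenes.append((start, end))
        start = end + 1
    return scenes

# Toy input: one keyframe histogram per shot, in the pattern A, B, A, C.
A, B, C = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
shots = [[A], [B], [A], [C]]
print(scenes_from_map(binary_similarity_map(shots)))  # [(0, 2), (3, 3)]
```

In the toy input, shot 1 looks nothing like its neighbors but still lands in the first scene because shots 0 and 2 are linked across it. This is the block-along-the-diagonal structure that the abstract associates with scenes.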
Similar resources
Video Shot Clustering Using Spectral Methods
The automatic segmentation and structuring of videos present technical challenges due to the large variation of content, spatial layout, and possible lack of storyline. In this paper, we propose a spectral method to group video shots into scenes based on their visual similarity and temporal relations. Spectral methods have been shown to be effective in capturing perceptual organization features...
Statistical Audio-Visual Data Fusion for Video Scene
Automatic video segmentation into semantic units is important for organizing effective content-based access to long videos. In this work, we focus on the problem of video segmentation into narrative units called scenes: aggregates of shots unified by a common dramatic event or locale. We derive a statistical video scene segmentation approach that detects scene boundaries i...
Effect of Global Visual Features on Scene Segmentation for Videos
Automatic video segmentation is the first and necessary step for organizing a long video file into smaller units for subsequent browsing and retrieval. The smallest basic unit is the shot, a contiguous sequence of frames recorded from a single camera operation. Relevant shots are then grouped into a high-level unit called a scene that conveys some meaning to viewers. In this paper, we first give a str...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
A method and browser for cross-referenced video summaries
We present an automatic tool for compact representation and cross-referencing of long video sequences, which is based on a novel visual abstraction of semantic content. Our highly compact hierarchical representation results from the non-temporal clustering of scene segments into a new conceptual form grounded in the recognition of real-world backgrounds. We represent shots and scenes using mosa...
Publication date: 2010